Multiple change point analysis of hepatitis B reports in Xinjiang, China from 2006 to 2021

Objective: Hepatitis B (HB) is a major global challenge, but there has been a lack of epidemiological studies on HB incidence in Xinjiang from a change-point perspective. This study aims to bridge this gap by identifying significant change points and trends. Method: The datasets were obtained from the Xinjiang Information System for Disease Control and Prevention. Change points were identified using binary segmentation for the full dataset and a segmented regression model for five age groups. Results: The results showed four change points for the quarterly HB time series, with the period between the first change point (March 2007) and the second change point (March 2010) having the highest mean number of HB reports. In the subsequent segments, there was a clear downward trend in reported cases. The segmented regression model showed different numbers of change points for each age group, with the 30–50, 51–80, and 15–29 age groups having higher growth rates. Conclusion: Change point analysis has valuable applications in epidemiology. These findings provide important information for future epidemiological studies and early warning systems for HB.


1. Introduction
Hepatitis B (HB) is a serious and prevalent disease caused by the hepatitis B virus (HBV) (1,2). It is a leading cause of hepatic decompensation, cirrhosis, and hepatocellular carcinoma (3). According to the World Health Organization (WHO), in 2019, approximately 296 million people were living with chronic HB infections, and 1.5 million new infections occurred each year (4). HBV can be transmitted through blood and other bodily fluids, including saliva, tears, semen, and vaginal secretions. The most common modes of transmission include transmission from mother to child during delivery, unsafe injections, and sexual contact with an infected partner (4).
In China, HB has topped the list of category B infectious diseases in the annual report of nationally notifiable infectious diseases from the Chinese Center for Disease Control and Prevention. Xinjiang, located in the northwest of China, is considered a high-prevalence area for HB (5-7). Therefore, conducting epidemiological studies in this region is critically important for the prevention and control of this disease.
The change point analysis (CPA) method can be used to investigate whether one or more changes occur within a series of data points and whether there are significant trends in time series data (8,9). CPA is widely applied in various scientific fields, including bioinformatics (10,11), financial modeling (12), hydrology (13), and climatology (14). Notably, in epidemiology, CPA can effectively quantify the burden caused by a particular disease and identify outbreaks of certain diseases (15), thus aiding the implementation of suitable prevention and control measures to prevent further outbreaks (16,17). The change points can also elucidate the pattern and infection cycle of an epidemic (18).
To gain a better understanding of the epidemiology of HB, it is crucial to investigate the change points and trend changes in HB time series data. Therefore, this study aims to achieve three objectives: (i) to discuss the epidemiological characteristics of HB from the perspective of change points, (ii) to identify the change points and trend changes in the full HB time series data, and (iii) to identify the change points and trend changes in the time series data of five different age groups.
2. Materials and methods

2.1. Study area and data source
Xinjiang is located at 74°-96° east longitude and 34°-49° north latitude in the northwest of China and covers an area of approximately 1,664,900 km². As of the end of 2022, the permanent resident population in Xinjiang amounted to 25.78 million. Our study utilized data on daily HB cases in Xinjiang (excluding the Xinjiang Production and Construction Corps) from 1 January 2006 to 31 December 2021, which were obtained from the Xinjiang Information System for Disease Control and Prevention. This dataset contains information on the patient's age, place of residence, onset date of symptoms, time of symptom confirmation, and confirmation address of symptoms. Based on the daily onset time series, we generated monthly, quarterly, and annual onset time series. As the data used for this analysis did not involve any private patient information, no ethical approval or informed consent was deemed necessary.
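
The aggregation of the daily onset series into monthly, quarterly, and annual series can be sketched as follows. This is a minimal illustration in Python rather than the authors' own processing; the function name `aggregate_onsets` and the example counts are hypothetical and are not taken from the study data.

```python
from collections import Counter
from datetime import date


def aggregate_onsets(daily_counts):
    """Sum daily case counts into monthly, quarterly, and annual totals.

    daily_counts maps a datetime.date (onset day) to the number of cases.
    """
    monthly, quarterly, annual = Counter(), Counter(), Counter()
    for day, cases in daily_counts.items():
        quarter = (day.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        monthly[(day.year, day.month)] += cases
        quarterly[(day.year, quarter)] += cases
        annual[day.year] += cases
    return monthly, quarterly, annual


# Made-up counts, for illustration only (not data from the study).
example = {
    date(2006, 1, 1): 3,
    date(2006, 1, 15): 2,
    date(2006, 4, 2): 4,
    date(2007, 12, 31): 1,
}
monthly, quarterly, annual = aggregate_onsets(example)
print(quarterly[(2006, 1)], quarterly[(2006, 2)], annual[2006])  # prints: 5 4 9
```

Keying the quarterly totals by (year, quarter) keeps the series ordered in time, which is the form a change point search needs.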

2.2. The method of CPA

2.2.1. The binary segmentation technique
The binary segmentation technique, a likelihood-based approach, was employed to detect changes in the HB time series, where the count data of HB reports in Xinjiang are assumed to follow the Poisson distribution. Let $y_{1:n} = (y_1, \ldots, y_n)$ be a time series. A change point is said to occur in the dataset if there exists a time $\tau \in \{1, \ldots, n-1\}$ such that the statistical properties of $\{y_1, \ldots, y_\tau\}$ and $\{y_{\tau+1}, \ldots, y_n\}$ differ. Extending the idea to multiple change points, $k$ changes occur at positions $\tau_{1:k} = (\tau_1, \ldots, \tau_k)$, where each change point position is an integer between 1 and $n-1$ inclusive, and we let $\tau_0 = 0$ and $\tau_{k+1} = n$ without loss of generality. The most common approach to identifying multiple change points is to minimize

$$\sum_{i=1}^{k+1} \left[ C\left(y_{(\tau_{i-1}+1):\tau_i}\right) \right] + \beta f(k), \tag{1}$$

where $C$ is a cost function for a segment (e.g., negative log-likelihood) and $\beta f(k)$ is a penalty to guard against overfitting (19). The cpt.meanvar function in the R package "changepoint" and the binary segmentation technique (19,20) were used to search for multiple change points. The cpt.meanvar function is a practical tool for detecting changes in both mean and variance. Binary segmentation aims to estimate an approximate minimum of Equation (1). Once the change points were identified, the corresponding segments could be represented: the $i$th ($i = 1, \ldots, k+1$) segment lies between the $(i-1)$th and $i$th change points.
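
To make the idea of penalised binary segmentation concrete, the following is a minimal Python sketch under stated assumptions. It is not the implementation in the R package "changepoint": the function names, the recursive formulation, and the penalty value $2\log n$ used in the example are illustrative choices, and the segment cost is twice the negative maximised Poisson log-likelihood with terms that do not depend on the segmentation dropped.

```python
import math


def poisson_cost(total, length):
    """Twice the negative maximised Poisson log-likelihood of one segment.

    Terms that are identical for every segmentation are dropped.
    """
    if total == 0:
        return 0.0
    return 2.0 * total * (math.log(length) - math.log(total))


def binary_segmentation(y, penalty, min_len=1):
    """Return sorted change point positions tau (segment i ends at tau_i)."""
    prefix = [0]
    for value in y:
        prefix.append(prefix[-1] + value)

    def seg_cost(a, b):
        # Cost of the segment y[a:b] (0-based, end exclusive).
        return poisson_cost(prefix[b] - prefix[a], b - a)

    found = []

    def split(a, b):
        # Find the single split of y[a:b] with the largest cost reduction.
        best_gain, best_t = 0.0, None
        for t in range(a + min_len, b - min_len + 1):
            gain = seg_cost(a, b) - seg_cost(a, t) - seg_cost(t, b)
            if gain > best_gain:
                best_gain, best_t = gain, t
        # Keep the split only if it beats the penalty, then recurse on both halves.
        if best_t is not None and best_gain > penalty:
            found.append(best_t)
            split(a, best_t)
            split(best_t, b)

    split(0, len(y))
    return sorted(found)


# Synthetic counts with level shifts after positions 20 and 40 (not study data).
counts = [10] * 20 + [50] * 20 + [20] * 20
print(binary_segmentation(counts, penalty=2 * math.log(len(counts))))  # prints: [20, 40]
```

Because each split is chosen greedily, the result only approximates the minimum of the penalised cost, which is why binary segmentation is described as estimating an approximate minimum.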

2.2.2. Segmented regression model
A segmented regression model estimates a non-linear relationship as two, three, or more straight lines connected at unknown points, which are referred to as change points, breakpoints, or join points. Let $y_t$ be the cumulative number of reported HB cases at time $t = 1, 2, \ldots, n$. The relationship between the mean response $E[y_t]$ and the explanatory variable $t$ is modeled by adding piecewise linear terms to the model. Then, the segmented model is described as follows:

$$\log E[y_t] = \beta_0 + \alpha_1 t + \sum_{k=1}^{K} \gamma_k (t - \psi_k)_+ ,$$

where $\beta_0$ is the intercept, $\psi_1 < \cdots < \psi_K$ are the change points, and $(x)_+ = \max(x, 0)$. Here, we assume there are $K+1$ different regimes with slopes $\alpha_1$, $\alpha_2 = \alpha_1 + \gamma_1$, ..., and $\alpha_{K+1} = \alpha_1 + \sum_{k=1}^{K} \gamma_k$. Then, we calculate the percent growth rate by $r_k = \exp\{\alpha_k\} - 1$ for each segment $k = 1, 2, \ldots, K+1$. In addition, we can also report the doubling time, $d_k = \log(2)/\alpha_k$, for each regime; this parameter expresses the time required for the number of cases to double. All the model parameters, including the breakpoints, can be estimated by Poisson likelihoods or quasi-likelihoods (21). The Bayesian Information Criterion (BIC) can be used to choose the better model when several segmented models have been fitted to the observed data. The analysis was performed in R (version 4.3.0) using the "segmented" package.
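
As a worked illustration of the growth rate $r_k$ and doubling time $d_k$ defined above, the sketch below derives both from a first slope and the successive slope changes. The function name `regime_summaries` and the numeric inputs are hypothetical, not estimates from this study, and returning an infinite doubling time for a non-positive slope is a convention assumed here.

```python
import math


def regime_summaries(alpha_1, gammas):
    """Return (slope, growth rate, doubling time) for each regime.

    alpha_1 is the slope of the first regime and gammas holds the slope
    changes at the successive change points, so K changes give K + 1 regimes.
    """
    slopes = [alpha_1]
    for gamma in gammas:
        slopes.append(slopes[-1] + gamma)

    summaries = []
    for alpha in slopes:
        growth = math.exp(alpha) - 1.0  # r_k = exp(alpha_k) - 1
        # d_k = log(2) / alpha_k; assumed infinite when cases are not growing.
        doubling = math.log(2) / alpha if alpha > 0 else math.inf
        summaries.append((alpha, growth, doubling))
    return summaries


# Hypothetical slopes per quarter, for illustration only.
for alpha, growth, doubling in regime_summaries(0.10, [-0.06, -0.03]):
    print(f"slope={alpha:.2f}  growth={growth:.3f}  doubling time={doubling:.1f}")
```

A flatter slope gives a longer doubling time, which is how an increasing $d_k$ across regimes indicates slowing growth.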

3. Results

3.1. Data processing and analysis
The data for this study were recorded in Excel 2022, and the CPA was performed in R (version 4.3.0). The significance level was 0.05.

3.2. HB incidence reports
In total, 670,681 HB cases were reported from 1 January 2006 to 31 December 2021 in Xinjiang, China. Figures 1, 2 show the study area and annual cases of HB reported in each region of Xinjiang, China. These two figures indicate that there were more reports of HB in the southern and central regions of Xinjiang. In general, the cases of HB in each region showed a downward trend year by year, but the number of cases still fluctuated in some regions, such as Urumqi, Kashgar, and Aksu. Figure 3 illustrates the incidence of HB across all age groups. It is evident from the figure that HB onset varies significantly by age. Therefore, the population was separated into five age groups (0-14, 15-29, 30-50, 51-80, and 80+), taking into account the age differences and HB prevention and control policies implemented in China. Figure 4 displays the cases of HB for the five age groups in Xinjiang, China, from 2006 to 2021. Most HB cases occurred in the 30-50, 15-29, and 51-80 groups; the sum of cases (641,469) in these three groups exceeded 95.6% of the total cases. The cases in each group also showed a decreasing trend with several fluctuations.

FIGURE 1
The study area and the number of HB reports of Xinjiang, China, from 2006 to 2021.

FIGURE 2
Annual cases of HB in each region of Xinjiang, China, from 2006 to 2021.

Frontiers in Public Health frontiersin.org

3.3. Change points in the time series data of HB
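In Xinjiang, the quarterly HB time series from 2006 to 2021 was analyzed, and four change points (orange dots) were detected using the binary segmentation technique. The segmented regression model showed different numbers of change points among age groups, as the best model was selected based on the lowest BIC (bold values in Table 1), with the BIC values reported in Table 1. In Figure 6, the CPA results corresponding to Table 1 are presented, shown as connecting points linked by lines. These lines are considered segments where the trends are similar and are identified by different colors. The incidence rates with the 95% confidence interval (95% CI) are calculated for each segment and shown in Figure 6.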
Moreover, doubling time (DT) is a parameter of interest that is often reported by epidemiologists and health professionals. Table 2 reports the DT with the 95% confidence interval (95% CI) for every segment. It is an easy-to-understand parameter: a shorter DT indicates faster growth in reported cases. It should be noted that the time series of quarterly HB data used here should take seasonal factors into account. The doubling times in the first segment are short for all five age groups. The results show a high incidence rate in this segment, especially for the 30-50 and 51-80 groups. However, the DT for each group tends to increase over the subsequent segments, which is a good signal in epidemiological terms.
In Table 3, we summarize the piecewise trends of the different age groups by means of the average percent change (Est.), which is calculated as the average of the slopes weighted by the corresponding interval width (22). We report the average growth rate (AGE = $e^{\mathrm{Est.}} - 1$) over the entire period. The results show that the 30-50, 51-80, and 15-29 age groups have the higher growth rates.
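
The width-weighted average of the slopes and the resulting AGE can be sketched as follows. This is a minimal illustration assuming the slopes and change point positions are already known; the function name `average_growth` and the numeric inputs are hypothetical and are not values from Table 3.

```python
import math


def average_growth(slopes, breakpoints, start, end):
    """Return (Est., AGE) for a piecewise linear trend on [start, end].

    Est. is the average of the regime slopes weighted by interval width,
    and AGE = exp(Est.) - 1.
    """
    edges = [start] + list(breakpoints) + [end]
    widths = [right - left for left, right in zip(edges, edges[1:])]
    if len(widths) != len(slopes):
        raise ValueError("need exactly one slope per interval")
    est = sum(s * w for s, w in zip(slopes, widths)) / (end - start)
    return est, math.exp(est) - 1.0


# Hypothetical slopes and change points over quarters 1 to 64, illustration only.
est, age = average_growth([0.10, 0.04, 0.01], breakpoints=[5, 25], start=1, end=64)
print(f"Est.={est:.4f}  AGE={age:.4f}")
```

Weighting by interval width means a long, slowly growing regime can outweigh a short, steep one in the overall figure.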

4. Discussion
Despite the implementation of universal HBV vaccination programs in the 1990s, which has helped many countries significantly reduce the incidence of acute HB and the prevalence of chronic carriers of HB surface antigen (HBsAg), HB remains a major global public health concern. This is particularly true in developing countries and rural areas where HBV is widespread. With its vast territory and large population, the prevention and control of HB remain an important public health issue in Xinjiang, China. Detecting significant change points in the Xinjiang HB report time series provides valuable epidemiological information, particularly in the temporal dimension (23).
Our results provided fundamental information about HB, such as its geographic distribution and age composition. More HB cases were reported in the southern and central cities of Xinjiang. The high number of reported HB cases in southern Xinjiang might be related to low awareness and vaccination rates among the population, as well as inadequate or suboptimal medical management of HB cases. In Urumqi, the central city of Xinjiang, the high incidence might be attributed to various factors. As the capital city and economic and cultural center of Xinjiang, Urumqi attracted a large number of migrant workers, farmers, left-behind children, and other population groups with potentially high rates of HB infection. Additionally, it can be attributed to the well-established HB testing and reporting systems in the region, which may ensure timely and accurate diagnosis and reporting. The declining trend of HB confirmation in several regions and age groups in Xinjiang (Figures 3, 4) might be related to the effectiveness of HB control measures and treatment programs in China (24). The specific change points and trend changes were reported using the CPA approach (Figure 5). The results showed a significant increase in reported HB cases in Xinjiang from 2006 to 2007. This increase could be attributed to the gradual improvement of the disease surveillance system, which led to improved accuracy in data collection. In segment 2 (from March 2007 to March 2010) and segment 3 (from March 2010 to June 2019), the HB reports remained at a high level for a long time. This suggested that Xinjiang was indeed one of the areas with the highest incidence of HB. However, the numbers were generally declining from 2010 to 2021, which might be due to improved health prevention awareness and an increase in HB vaccination rates among the population. Nevertheless, the low number of reported HB cases since 2020 should not be viewed with blind optimism. Undoubtedly, the sudden global outbreak of COVID-19 had a significant impact on the reported number of HB cases beginning in 2020 (25).
According to the results of the segmented regression model, the age groups of 30-50 and 51-80 experienced a higher growth rate, as shown by a smaller DT (Table 2) and higher values of AGE (Table 3). Unhealthy lifestyle choices, such as excessive workload, life pressures, frequent late nights, inappropriate work practices, [...] For newborns, strict adherence to the administration of the HB vaccine is necessary. Young people should also adopt a healthy lifestyle to reduce the risk factors associated with the disease. Regular physical examinations for older adults can help identify the disease at its earliest stage. Moreover, health education on HB is pivotal for all age groups. The public should be well-informed about HB, including transmission and prevention measures. Health authorities should proactively raise awareness and understanding of the disease in communities, schools, and workplaces. Access to HB testing and treatment should be readily available. Health facilities should be adequately equipped with resources and trained healthcare professionals to diagnose and treat the disease. Affordable treatment options must be made available to ensure everyone has equal access to treatment. Additionally, continuous research on HB prevention and treatment is crucial; this includes the development of new therapies and improved vaccines to better combat HB.
In summary, HB prevention and control involve a comprehensive approach targeting different age groups. It requires collaboration between health authorities, healthcare professionals, and the public. By implementing these recommendations, we can effectively reduce the burden of HB and ensure a healthy future for all. In the context of future HBV prevention and control, it is possible to utilize more precise change point detection methods to study the epidemiological characteristics of HBV itself. Furthermore, such analysis can be conducted to evaluate the effectiveness of disease prevention and control measures.

5. Conclusion
Investigating change points and trend changes can facilitate informed decisions regarding the prevention of further disease outbreaks, as they can elucidate the pattern and infection cycle of an epidemic. Although the CPA showed that reports of HB decreased, HB persists as a significant public health problem in Xinjiang. As China undergoes rapid development, the social circles of young people are expanding, and unhealthy lifestyle choices such as smoking, irregular rest schedules, and drinking are becoming more prevalent. These behaviors may contribute to the burden of HB and increase its incidence rate in the future. We therefore urge local public health departments to prioritize the prevention and control of HB. Society as a whole should promote healthier lifestyles, and supervision of HB vaccination for newborns in rural and remote areas should remain a focus. Furthermore, the study highlights the differing growth rates of HB among age groups, further emphasizing the significance of adopting age-specific prevention and control measures to mitigate its transmission and spread.

FIGURE 3
Annual cases of HB for all ages in Xinjiang, China, from 2006 to 2021.

TABLE 1 The BIC value of CPA in each sequence of five age groups in Xinjiang.

TABLE 2 The doubling time and the 95% confidence interval (95% CI) of each segment for five age groups in Xinjiang.

TABLE 3 The means of the average percent change (Est.) and the AGE for each sequence of the five age groups in Xinjiang.